Using x-gram for efficient speech recognition
Abstract
X-grams are a generalization of n-grams in which the number of previous conditioning words is not fixed but is decided for each case from the training data. X-grams reduce perplexity with respect to trigrams and require fewer parameters. In this paper, the representation of x-grams as finite state automata is considered. This representation leads to a new model, the non-deterministic x-grams, an approximation that is much more efficient while suffering only a small degradation in modeling capability. Empirical experiments on a continuous speech recognition task show how, for each ending word, the number of transitions is reduced from 1222 (the size of the lexicon) to around 66.

1. REVIEW OF X-GRAMS

One of the main components of continuous speech recognition is the language model, which has to estimate the a priori probability of each possible sentence. The most widely used models are based on n-grams, typically bigrams or trigrams, which estimate the probability of each word taking into account only the previous n-1 words. Recently, the authors introduced x-grams [1]. The difference between n-grams and x-grams is that in x-grams the history length is not fixed by the designer of the language model but is found automatically by the algorithm from the training data. For each given word sequence, the method decides how many of the preceding words actually condition the probability of the incoming word. To decide whether a history is relevant, the algorithm uses the number of times that the history occurs in the training data and a divergence function [1]. X-grams are estimated using back-off smoothing.

X-grams are more efficient than trigrams in the following sense: they are more compact (smaller) than trigrams and yield lower perplexity. This is because some probability distributions conditioned on the two previous words do not improve the estimation with respect to conditioning on only one previous word, while in other cases the estimation conditioned on the 3, 4 or, in general, x previous words is significantly better than the estimation that considers only the two previous words.

To illustrate this, Table 1 shows the perplexity and the number of histories for trigrams and for x-grams. The task consists of queries to a database with geographical information (rivers, mountains, cities, etc.). The models are trained on 8262 queries (90,000 words) and the perplexity is evaluated on 1147 queries; all queries in the training and test sets are different. The size of the lexicon is 1222 words.

Table 1. Perplexity and number of histories.
Model        Perplexity        # histories

This research was supported by CICYT, contract TIC95-0884-C04-02.
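To make the variable-length idea concrete, the following is a minimal sketch, not the authors' implementation: histories of any length are retained only if they occur often enough in the training data (the paper additionally uses a divergence criterion, omitted here), and probabilities back off from the longest retained history to shorter ones. The class name XGramSketch, the count threshold, and the constant back-off weight are illustrative assumptions; a real back-off model would use properly normalized discounting.

```python
from collections import defaultdict


class XGramSketch:
    """Toy variable-length ("x-gram") back-off model (illustrative sketch)."""

    def __init__(self, max_order=4, min_count=2, backoff=0.4):
        self.max_order = max_order
        self.min_count = min_count        # keep a history only if seen this often (assumption)
        self.backoff = backoff            # crude constant back-off weight (assumption)
        self.hist_counts = defaultdict(int)                         # history -> count
        self.next_counts = defaultdict(lambda: defaultdict(int))    # history -> word -> count
        self.vocab = set()
        self.kept = {()}

    def train(self, sentences):
        for words in sentences:
            self.vocab.update(words)
            padded = ["<s>"] * (self.max_order - 1) + list(words)
            for i in range(self.max_order - 1, len(padded)):
                for n in range(self.max_order):          # history lengths 0 .. max_order-1
                    hist = tuple(padded[i - n:i])
                    self.hist_counts[hist] += 1
                    self.next_counts[hist][padded[i]] += 1
        # variable-length histories: keep only those seen at least min_count times
        self.kept = {h for h, c in self.hist_counts.items()
                     if len(h) == 0 or c >= self.min_count}

    def prob(self, word, history):
        """P(word | history), backing off from the longest retained history."""
        history = tuple(history)[-(self.max_order - 1):]
        weight = 1.0
        for n in range(len(history), -1, -1):
            hist = history[len(history) - n:]
            if hist not in self.kept:
                continue                                  # history too rare: try a shorter one
            count = self.next_counts[hist].get(word, 0)
            if count > 0:
                return weight * count / self.hist_counts[hist]
            weight *= self.backoff                        # word unseen after this history
        return weight / max(len(self.vocab), 1)           # uniform floor
```

In the actual x-gram algorithm the decision to keep a longer history also depends on how much its conditional distribution diverges from that of the shorter history, which is what lets long histories survive only where they genuinely improve the estimation.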
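Perplexity, the figure reported in Table 1, is the exponential of the average negative log-probability that the model assigns to the held-out words. A minimal computation, assuming any model object exposing a prob(word, history) method (an interface chosen here for illustration, not taken from the paper), could look like this:

```python
import math


def perplexity(model, sentences):
    """Perplexity over a test set: exp of the average negative log-probability."""
    log_prob_sum = 0.0
    num_words = 0
    for words in sentences:
        history = []
        for word in words:
            p = model.prob(word, history)
            log_prob_sum += math.log(max(p, 1e-12))   # floor to avoid log(0)
            num_words += 1
            history.append(word)
    return math.exp(-log_prob_sum / max(num_words, 1))
```

Comparing this number, together with the number of retained histories, is how Table 1 contrasts trigrams with x-grams on the 1147 held-out queries.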